- Course overview
- Introduction to data science
- Some examples
- Pipes in R
- Visualization with ggplot2
Data Science and Predictive Machine Learning
When figures and other external sources are shown, the references are included when the origin is known.
You can find all materials at the following location:
The MS Teams environment can be found here. Participants can join the meeting online here
Dark data scientist
Expertise: Missing data theory, statistical programming, computational evaluation
| Week # | Focus |
|---|---|
| 1 | Intro, modeling and regression |
| 2 | Classification and cross-validation |
| 3 | Regularisation |
| 4 | Support Vector Machines and non-linear predictions |
Exploratory Data Analysis:
Describing interesting patterns: use graphs and summaries to understand subgroups, detect anomalies, and get to know the data.
Examples: boxplot, five-number summary, histograms, missing data plots, …
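A minimal sketch of the exploratory tools listed above, assuming the built-in `airquality` data set as a stand-in for real course data. It uses base R graphics only so that it runs without extra packages; the variable names `ozone_summary` and `missing_per_column` are illustrative choices, not part of the course materials.

```r
# Built-in data set: daily air quality measurements in New York (1973)
data(airquality)

# Five-number summary of ozone: min, lower hinge, median, upper hinge, max
# (fivenum() removes missing values by default)
ozone_summary <- fivenum(airquality$Ozone)
print(ozone_summary)

# Count missing values per column to spot incomplete variables
missing_per_column <- colSums(is.na(airquality))
print(missing_per_column)

# Histogram: the overall distribution of ozone
hist(airquality$Ozone, main = "Ozone distribution", xlab = "Ozone (ppb)")

# Boxplot per month: compare subgroups and look for outliers
boxplot(Ozone ~ Month, data = airquality, xlab = "Month", ylab = "Ozone (ppb)")
```

The missing-value count is a numeric alternative to a missing data plot: it shows which variables are incomplete before any modeling starts.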
Supervised learning:
Regression: predict continuous labels from other values.
Examples: linear regression, support vector machines, regression trees, …
Classification: predict discrete labels from other values.
Examples: logistic regression, k-nearest neighbours (kNN), …
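The difference between the two supervised tasks can be sketched in a few lines of R. This is an illustration under assumptions, not the course's own analysis: it uses the built-in `mtcars` data, a single predictor (`wt`, weight in 1000 lbs), and a hypothetical new car `new_car`. The 0.5 cut-off for turning a probability into a class is also an arbitrary choice made for the example.

```r
data(mtcars)

# Hypothetical new observation: a car weighing 3000 lbs (wt is in 1000 lbs)
new_car <- data.frame(wt = 3)

# Regression: predict a continuous label (miles per gallon) from weight
reg_fit <- lm(mpg ~ wt, data = mtcars)
predicted_mpg <- predict(reg_fit, newdata = new_car)

# Classification: predict a discrete label from weight
# (am is coded 0 = automatic, 1 = manual transmission)
clf_fit <- glm(am ~ wt, data = mtcars, family = binomial)
prob_manual <- predict(clf_fit, newdata = new_car, type = "response")

# Turn the predicted probability into a class label (assumed 0.5 cut-off)
predicted_class <- ifelse(prob_manual > 0.5, "manual", "automatic")

print(predicted_mpg)
print(prob_manual)
print(predicted_class)
```

Both models use the same predictor; only the type of label differs. Linear regression returns a number on the scale of the outcome, while logistic regression returns a probability that still has to be mapped to a class.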
How do you think that data analysis relates to:
People from different fields (such as statistics, computer science, information science, industry) have different goals and different standard approaches.
In this course we emphasize drawing insights that help us understand the data.
Source: wikimedia commons and MIMP summerschool slide 28
Challenger space shuttle - 28 Jan 1986 - 7 deaths
When high-risk decisions are at hand, it is paramount to analyze the correct data.
When thinking about important topics, such as whether to stay in school, it helps to know that more highly educated people tend to earn more, but also that there is no difference for top earners.
Before John Snow, people thought “miasma” caused cholera, and they fought it by airing out the house. It was not clear whether this helped, but people assumed it did because the “miasma” theory said so.
If we know that the flu is coming two weeks earlier than usual, that is just enough time to buy flu shots for the most vulnerable people.
The above examples have in common that data analysis and the accompanying visualizations have yielded insights and solved problems that could not be solved without them.